ABSTRACT

Messenger RNA is a key component of an intricate 
regulatory network of its own. It accommodates 
numerous nucleotide signals that overlap protein 
coding sequences and are responsible for multiple 
levels of regulation and generation of biological 
complexity. A wealth of structural and regulatory in- 
formation, which mRNA carries in addition to the 
encoded amino acid sequence, raises the question 
of how these signals and overlapping codes are 
delineated along non-synonymous and synonymous 
positions in protein coding regions, especially in 
eukaryotes. Silent or synonymous codon positions, 
which do not determine amino acid sequences of 
the encoded proteins, define mRNA secondary 
structure and stability and affect the rate of transla- 
tion, folding and post-translational modifications of 
nascent polypeptides. The RNA level selection is 
acting on synonymous sites in both prokaryotes 
and eukaryotes and is more common than previ- 
ously thought. Selection pressure on the coding 
gene regions follows a three-nucleotide periodic
pattern of nucleotide base-pairing in mRNA, which 
is imposed by the genetic code. Synonymous pos- 
itions of the coding regions have a higher level of 
hybridization potential relative to non-synonymous 
positions, and are multifunctional in their regulatory 
and structural roles. Recent experimental evidence 
and analysis of mRNA structure and interspecies 
conservation suggest that there is an evolutionary 
tradeoff between selective pressure acting at the 
RNA and protein levels. Here we provide a compre- 
hensive overview of the studies that define the role 
of silent positions in regulating RNA structure and
processing that exert downstream effects on
proteins and their functions. 

INTRODUCTION 

Sequencing of multiple genomes in recent decades revealed 
that the number of protein-coding genes in multicellular 
organisms is surprisingly low compared with the variety of 
biological functions performed by these proteins and the 
resulting physiological and morphological complexity of 
higher eukaryotic species (1-4). Such a major increase in
functional complexity is largely generated at two funda- 
mental levels: (i) transcriptional and post-transcriptional 
control that regulates differential gene expression, alterna- 
tive transcription and splicing, and (ii) post-translational 
modifications that affect protein structure, function and 
metabolic fate, and facilitate a large variety of functions 
performed by these proteins in vivo (5-8). Prominently, 
events that occur in between these two levels of regulation 
and involve all the steps that lead from mRNA to protein 
have not been factored into this complexity in earlier 
studies. 

Until recently, mRNA has been viewed solely as a 
carrier of the genetic code, transmitting information 
about the primary amino acid sequence from genes to 
proteins. Recent studies reveal a surprisingly important 
role of mRNA in the regulation of biological complexity. 
As we now know, mRNA is a key component of an intri- 
cate regulatory network of its own, which is different 
from the networks and pathways involved in DNA and 
protein regulation. Eukaryotic organisms carry multiple 
regulatory and structural signals in mature mRNA and 
pre-mRNA, delineated along the protein-coding and
non-coding regions in a complex overlapping manner.
The key provision that enables mRNA to carry these regu- 
latory functions is the redundancy of the genetic code that 
allows for many synonymous nucleotide substitutions that 
do not change amino acid sequences of the encoded
proteins and are therefore often called 'silent' mutations.
Synonymous nucleotide substitutions due to mutagenesis, 
errors in splicing and RNA editing can confer dramatic 
differences to the structure and function of mRNA itself 
that provide diverse possibilities for the regulation of gene 
expression patterns (7-10). Within the protein-coding 
regions (CDSs), the redundancy of the genetic code 
allows for the overlap in encoding amino acid sequences 
and RNA functional and structural signals, especially at 
the key structural and reference sites, such as the vicinity 
of the start and stop codons (10) as well as the exon-intron
boundaries (11). The question of to what purpose
and extent genomes exploit their non-coding
potential is still open (4,12).

There are several well-documented ways in which syn- 
onymous sites exert their impact on gene functions: effect 
on mRNA splicing, mRNA folding, stability and regulation
of translation through utilization of preferred
synonymous codons that translate more efficiently and 
accurately. Additional and sometimes opposing selective 
forces appear to affect codon frequency as well. Previous 
findings show roles for synonymous positions in RNA- 
RNA interactions, which influence the translation 
efficiency, and in RNA-RNA cross-talk, which is a key 
to biological regulation of expression and transcriptome 
complexity (13-15). Emerging evidence shows that 'silent' 
substitutions carry a wealth of information, which is 
written over the encoded amino acid sequence, and that 
this information can be used to regulate translation speed, 
protein homeostasis, metabolic fate and even post- 
translational modifications, which will be discussed in 
this review. Here we will focus on the RNA level of regu- 
lation and the role of synonymous sites and mRNA struc- 
ture in generating biological complexity. 

SYNONYMOUS SITES AND CODON USAGE 
AFFECT GENE EXPRESSION 

Although the genetic code is generally conserved among 
organisms, synonymous codons in different species are 
used with different frequencies — a trend commonly 
defined as codon usage bias. Codon usage bias reflects 
selection for optimization of the translation process by 
tRNA abundance in many organisms. However, other
factors such as GC nucleotide composition (16),
RNA stability and folding (10), local RNA secondary
structures (17), mRNA longevity (18), protein structure 
(19), compositional strand bias (20) and strand asymmetry 
induced by transcription-coupled repair (21) have also 
been proposed to affect nucleotide preferences at syn- 
onymous sites (9,22). Some of these factors are universal, 
whereas other factors act at specific levels of biological 
organization or under specific conditions. 
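
As an illustration of how codon usage bias can be quantified, the following minimal sketch computes relative synonymous codon usage (RSCU) for a single coding sequence. RSCU is a commonly used measure that is not defined in this review and is introduced here only as an example; the function name `rscu` and the toy input are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict

# Standard genetic code built in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def rscu(cds):
    """Return RSCU values for every codon of each amino acid seen in `cds`.

    RSCU = observed codon count / count expected if all synonymous
    codons of that amino acid were used equally. (Illustrative helper,
    not a method taken from the review.)
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)              # amino acid -> synonymous codons
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    result = {}
    for aa, members in families.items():
        total = sum(counts[c] for c in members)
        if aa == "*" or total == 0:           # skip stops and unseen amino acids
            continue
        expected = total / len(members)
        for c in members:
            result[c] = counts[c] / expected
    return result


if __name__ == "__main__":
    # Toy sequence: four leucine codons, three of them CTG.
    for codon, value in sorted(rscu("CTGCTGCTGCTA").items()):
        print(codon, round(value, 2))
```

Values above 1 indicate codons used more often than expected under equal usage of synonymous codons; values below 1 indicate under-represented codons.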

Synonymous sites are not neutral 

The neutral theory maintains that codon preferences exist 
because of differences in codon mutability and that most
synonymous mutations spread to fixation by chance, and, 
therefore, have no effect on the fitness of organisms (23). 
However, a new wave of evidence for widespread selection
pressure on the nucleotide level in the eukaryotic genomes
and demonstration of the importance of synonymous 
positions for regulation of translation and splicing 
(8-10,24-29) cast doubt on the statement of the neutral 
theory (23). These observations support the theory that 
synonymous positions are under selection and codon 
bias is maintained by a balance between selection, muta- 
tion and genetic drift (30-32). 

GC content is a significant feature affecting codon pref- 
erences in different organisms (11,25,28). Across many 
species (675 bacteria, 52 archaea and 10 fungi), the differences
in codon usage can be predicted from the nucleotide
content of their non-coding sequences (33). However, 
GC content is determined not solely by genome-wide 
requirements, but also by selective forces that act on the 
coding regions (22). Indications of selection on synonym- 
ous positions were noted in Drosophila melanogaster and 
Caenorhabditis elegans, where most of the third positions in 
optimal codons contain a cytosine or guanine (32). 
Similarly, codon usage in mammals is obviously non- 
random due to elevated frequencies of G and C at synonym- 
ous sites (9,34). In different species, greater GC content at 
synonymous positions in the coding regions compared 
with the flanking introns could indicate selection at syn- 
onymous sites (34,35) and could be considered as a major 
factor of evolution (see 'Evolution' section). A pattern of 
polymorphism in GC-rich human genes, which is unex- 
plained in the framework of the mutation bias hypothesis, 
is consistent with the action of natural selection or biased 
gene conversion (36). In mammals, synonymous sites 
within the first exons are more GC-rich than within the 
last exons of the genes, a feature likely relevant to translation
regulation, whereas there is no difference between
GC contents of first and last introns of genes (34). 
Different patterns in codon bias have also been observed 
at the beginning and at the end of bacterial genes (37). 
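
The comparison of GC content at synonymous positions with that of flanking non-coding sequence can be sketched as follows. This is a simplified illustration: the third codon position is used as a proxy for synonymous sites (an approximation, because not every third position is degenerate), and the function names and toy sequences are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C nucleotides in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_by_codon_position(cds):
    """GC fraction at codon positions 1, 2 and 3 of an in-frame CDS.

    Position 3 is treated as an approximate stand-in for synonymous
    sites; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return tuple(gc_fraction(cds[pos:usable:3]) for pos in range(3))


if __name__ == "__main__":
    cds = "ATGGCCAAGTAA"        # toy CDS: ATG GCC AAG TAA
    intron = "GTAAGTATTTTCAG"   # toy flanking intron sequence
    gc1, gc2, gc3 = gc_by_codon_position(cds)
    print("GC1, GC2, GC3:", gc1, gc2, gc3)
    print("intron GC:", round(gc_fraction(intron), 3))
```

In a real analysis, GC3 would be averaged over many genes and compared with introns of the same genes, so that regional variation in base composition is controlled for.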

At the mRNA level, synonymous positions were found 
to control folding, stability and secondary structures
of mRNAs in different organisms and affect translation 
efficiency and post-translational regulation through 
mRNA-RNA and mRNA-protein interactions. Some of 
these structural and regulatory RNA features are defined 
by local nucleotide content, and codon preferences 
within specific genes, as well as across genes within the 
genome (9,10). 

It is well established that synonymous codons are used
non-randomly and can drive translational selection and 
affect codon preference in many organisms (22). It is 
difficult to explain by mutational pressure alone why 
preferred codons are recognized by more abundant 
tRNA molecules, or how the strong variability of codon
bias across genes within the genome is maintained, where 
more pronounced codon usage bias is characteristic for 
highly expressed genes. The level of gene expression cor- 
relates strongly with codon bias in many prokaryotes and 
eukaryotes, while co-expressed genes have similar syn- 
onymous codon usages within the genomes of human, 
yeast, worms and bacteria (38). These observations 
suggest a role for synonymous positions in the regulation
of translation and support the notion that synonymous 
positions are not neutral. 

Codon usage and selection for translation efficiency
and accuracy 

Expression level is an important determinant of protein 
evolution rates (39,40), and translational selection is one 
of the most important driving forces in evolution (22). 
Earlier studies considered codon selection for maximiza- 
tion of the translational efficiency under conditions when 
selection favors rapid translation and the relevant iso- 
acceptor tRNAs might not be equally abundant (22). 
Under such conditions, a pressure exists to use the 
codons that match the most abundant tRNAs to facilitate 
translation. Utilization of common or rare codons can 
significantly affect the rate of ribosome translocation 
through mRNA, as the limited availability of the corresponding
aminoacyl-tRNAs is expected to cause delays and
ribosome stalling at the rare codon sites. Differential codon 
usage is associated with varying expression rates in many 
organisms (9). Positive correlation between codon usage 
bias and gene expression level was established in bacteria 
(41,42), yeast (41), nematodes (43) and insects (44). As
expected, bias in favor of preferred codons is more 
pronounced in highly expressed genes and mostly 
observed in prokaryotic species with large populations, 
although some prokaryotes do not show any clear signs 
of selection for translation efficiency (45). 
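
One common way to relate codon usage to expression level is the codon adaptation index (CAI), which scores a gene by how closely its codons match those preferred in a reference set of highly expressed genes. The sketch below is a minimal illustration under stated assumptions: CAI is not a measure used or defined in this review, the reference set is a toy example, the 0.5 pseudocount for unobserved codons is one conventional choice, and all function names are hypothetical.

```python
import math
from collections import Counter

# Standard genetic code built in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codons_of(cds):
    """Split an in-frame CDS into valid codons, ignoring a partial last one."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)
            if cds[i:i + 3] in CODON_TABLE]


def relative_adaptiveness(reference_cds_list):
    """Weight w = codon count / count of the most used synonymous codon,
    estimated from a reference set (assumed to be highly expressed genes)."""
    counts = Counter()
    for cds in reference_cds_list:
        counts.update(codons_of(cds))
    weights = {}
    for aa in set(CODON_TABLE.values()) - {"*"}:
        members = [c for c, a in CODON_TABLE.items() if a == aa]
        top = max(counts[c] for c in members)
        if len(members) == 1 or top == 0:     # Met, Trp, or unseen amino acid
            continue
        for c in members:
            # 0.5 pseudocount keeps unobserved codons from giving log(0).
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of the scored codons in `cds`."""
    logs = [math.log(weights[c]) for c in codons_of(cds) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


if __name__ == "__main__":
    w = relative_adaptiveness(["CTGCTGCTGCTA"])   # toy reference set
    print(round(cai("CTGCTG", w), 3), round(cai("CTGCTA", w), 3))
```

A CAI close to 1 means that a gene uses almost exclusively the codons favored in the reference set; correlating such scores with measured expression levels is one way the relationships described above are typically examined.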

Recent experiments led to the conclusion that redun- 
dancy in the genetic code allows translation of synonym- 
ous but differentially coded mRNAs at different rates, 
even with fixed tRNA usage (46-48). Codon usage can
significantly affect the speed of translation elongation in 
bacteria. In Escherichia coli, the rate for aminoacyl-tRNA 
association with different codons spans a 25-fold range 
and preferred codons accept aminoacyl-tRNAs faster 
than more rarely used codons (49). The use of common 
codons can increase the rate of translation elongation 
severalfold, compared with the rare ones (50). In
bacteria, codon usage represents an adaptation in those 
species that undergo rapid environmental changes and has 
been directly linked to changes in protein expression (38).
In some fungi, natural selection also generally favors 
optimal codon variants, but fixation of optimal codons 
is reduced in rapidly evolving long genes (51). 

A more complex picture emerges in mammalian species,
where evidence supporting translational selection of 
codon choice is arguable (9,52,53). Experimental 
evidence was reported that tRNA content in rabbit reticu- 
locytes is specialized for the synthesis of hemoglobin, 
which constitutes >80% of total protein expression in 
these cells (54,55). However, no correspondence between 
the usage of a codon in human protein-coding sequences 
and the abundance of iso-accepting tRNA has been found 
in several studies on the genome level (32,56-59). It was 
shown that translational selection, measured as the co-adaptation of
specific tRNA gene copy numbers and codon usage
across genomes, is more than 10 times lower
for mammalian than for non-mammalian organisms, such
as E. coli, yeast and worms (52). Only a weak correlation 
was found between expression level and frequency of 
optimal codons for human genes (60). Similarly, a weak 
correlation between levels of gene expression and amino
acid composition, accountable for ~10% of the variation
in expression levels, was reported recently for mouse 
protein-coding genes (61). This is not surprising, as the 
identity and diversity of the optimal codons in mammalian
genomes is determined largely by the majority of genes, 
on which selection is much weaker, whereas selection for 
the use of optimal codons is strongest in highly expressed 
genes (33). 

While most of the genes seem to be under selection to
increase usage of the preferred codons, some genes undergo 
opposite selection (62). There is an advantage to use rare 
codons in certain positions where they have a potential 
to slow down translation rate, especially at the elongation 
stage, because of the relatively longer time of rare tRNA 
delivery (46). Rare codons are enriched in lowly expressed
genes in several genomes, including humans (60). In line
with this, different protein structural elements are 
associated with specific codon usage: α-helical regions are
enriched in common (fast translated) codons, whereas disordered
and β-sheet structures are mostly encoded by
rare codons (63). Thus, rare codons likely provide an
opportunity for translation pause and allow the translated 
segment of the protein to be folded properly without 
potentially interfering with the downstream segments that 
have not been translated yet (64). 

Selection on codon bias may also increase translation 
accuracy (65) because selection favors optimal codons 
at sites where changes are most likely to disrupt protein
functions (44). Significant association of evolutionary 
conserved regions with optimal codons was found in 
many different species on the transcriptome level 
(65-67). Some studies suggest that selection for translation 
accuracy might be required to prevent protein misfolding 
errors leading to the loss of functional protein molecules 
(65). This idea is supported by the observation that buried 
amino acids, responsible for protein folding, are preferen- 
tially encoded by more optimal codons, compared with 
surface residues, which participate in intermolecular inter- 
actions (68). 

Determination of the roles of synonymous positions on 
the multiple levels of protein regulation is a highly dynamic 
rapidly emerging field. Notably, these roles appear to be 
different in prokaryotes and eukaryotes. It is clear that 
protein-coding sequences in higher eukaryotes require di- 
versification for functional integrity, and this is achieved 
by the use of different codons in their variable and consti- 
tutive regions through different selection mechanisms (69). 
Thus, a vast body of recent evidence demonstrates that 
nucleotide preferences in synonymous positions contribute 
to the efficiency and accuracy of protein expression, and 
a bias for preferred synonymous nucleotides is generated 
and maintained by selection (22,31,32,70). 

More than codon usage 

A recent study reviewing codon usage bias in hundreds 
of prokaryotic genomes revealed that this bias is highly 
variable in different prokaryotes, ranging from high 
degrees of differential use of synonymous codons among 
different genes to virtually none (71). As mentioned pre- 
viously, this parameter was found to correlate with the 
range of habitats for particular organisms: those with the
necessity to adapt to a variety of environments (including 
pathogens) demonstrated a higher extent of codon usage 
bias compared with those organisms that live only in a 
particular habitat. Thus, in prokaryotes, codon usage 
appears to represent an adaptation measure that can 
regulate the overall ability of the organism to undergo 
rapid changes under the pressure of each particular envir- 
onment (71). Perturbing the codon usage directly affects 
the level or even direction of changes in protein expression 
in response to environmental stimuli. It has been shown 
for different prokaryotic and eukaryotic species that 
codon usage is universally function-specific and cells 
may need to dynamically alter their intracellular tRNA 
composition to adapt to their new environment or adopt 
a novel physiological role (38). 

Apart from mRNA, translation efficiency depends on 
another essential player: tRNA. Transfer RNA gene
content is a key factor that defines the efficiency of the
translation machinery. Remarkably, the repertoire of tRNA
genes varies greatly between different organisms (72-75). 
Certain tRNA species are absent in entire branches of the 
phylogenetic tree, whereas others are clearly predominant. 
For example, in Homo sapiens, 29 of the 43 tRNA-Ala genes
(68%) correspond to the iso-acceptor tRNA-Ala(AGC).
Similar relationships were reported for bacterial species, 
and the underlying reasons are poorly understood. A 
recent study tracing the correlation between two tRNA 
modifications in base 34 of the anticodon that increase 
codon-pairing ability, mediated by tRNA-dependent adenosine
deaminases and uridine methyltransferases (76),
found that the emergence of these modifications likely 
played a role in shaping of genomes and directing evolution 
of many species (77). Comparison of more than 500 differ- 
ent genomes showed that these two modifications likely 
define patterns of gene expression that correlate with the 
separation of living organisms into archaea, bacteria and 
eukaryotes (77). This study presents an entirely different 
angle in viewing the relationship between coding sequence 
and gene expression, and defines a novel feature of pro- and 
eukaryotic codon usage driven by tRNA modifications. 

Moreover, not only codon usage, but also codon 
context or the positioning of the particular codons in 
relation to their neighbors (i.e. codon pair usage) is 
subject to evolutionary pressure and apparently plays an 
important role in mRNA translation. Comparison of 
codon context for multiple genes in several eukaryotic 
species showed that both synonymous and non- 
synonymous mutations are selected to maintain context 
biases (78). These data are in agreement with an observa- 
tion that the amino acid replacement changes can disrupt 
the codon context sufficiently to increase the probability
of fixation of subsequent silent changes in adjacent codons 
(79). In vivo studies provided evidence for the role of 
codon context in decoding fidelity and efficiency in different
organisms, suggesting that codon context modulates
evolution of the primary nucleotide sequence in the 
protein-coding genes and fine-tunes the structure of the 
open reading frames to ensure fidelity and efficiency of 
genome architecture (10,80). 



In summary, many factors determine the choice of 
codons, and selection on the codon bias likely acts at 
both the transcriptional and translational level. tRNA 
relative abundance, modifications and codon usage could 
drive each other to synergistically optimize the efficiency 
of gene expression. Elevated GC content of synonymous 
positions in many eukaryotic and prokaryotic genomes 
suggests that the RNA-level selection pressure contributes 
to codon preferences. Local codon context or positioning 
of particular codons in relation to their neighbors also 
might help to accommodate diverse regulatory signals and 
RNA structural elements in the protein-coding regions. 

Unlike prokaryotes, eukaryotic organisms appear not
to use codon usage bias as a dominant mechanism of regu- 
lation of protein expression. Instead, codon preferences 
are used to accommodate diverse regulatory elements re- 
sponsible for the variability of molecular and cellular 
mechanisms and to provide a new level of biological
complexity, especially in protein-coding regions of higher 
eukaryotes. 

ROLE OF SYNONYMOUS POSITIONS IN mRNA 
FOLDING, STABILITY AND PROTEIN FATE 

mRNA secondary structure and regulation of translation 

In 1972, White et al. (81) suggested that redundancy in the
genetic code permits extensive variation of the nucleotide 
sequence and satisfies the requirements for both protein 
and RNA structure. Fitch (82) found the first evidence that
degeneracy of the genetic code is used to optimize base- 
pairing in mRNA molecules. Since then, the idea that re- 
dundancy of the genetic code allows preservation of 
mRNA folding has been supported by several lines of
evidence that are discussed in this and following sections. 

Single-stranded mRNA molecules form secondary struc- 
tures through complementary self-interactions. Formation 
of RNA structures is dependent on the primary nucleotide 
sequence and folding environment, and is often defined by 
the longer-range interactions between the nucleotides. 
Evolutionarily conserved local secondary structures were 
described in eukaryotic and mammalian mRNAs and pre-mRNAs
(83). Synonymous substitutions affect mRNA
translation in different organisms (41,50,84). They can 
induce significant changes in the mRNA folding, causing 
formation of new stable hairpin loops and elements of 
higher-order folding. Recent studies suggest that the placement
of stable structural elements within the mRNA
sequence is far from random, and propose that transient 
ribosome stalling at key mRNA regulatory sites can affect 
protein abundance, folding and even post-translational 
modifications, as is discussed in the following sections. 
Stable structural elements can significantly affect transla- 
tion initiation and ribosome translocation, inducing 
ribosome pausing and stalling that could considerably
delay the overall progress of protein synthesis and 
folding of nascent polypeptides. Strong mRNA secondary 
structures formed due to gene-specific codon usage have 
been implicated in discontinuous translation and pauses 
in synthesis of insect silk fibroin, chicken collagen and 
other proteins (85,86). Although stable secondary 



Nucleic Acids Research, 2013, Vol. 41, No. 4 2077 



structures capable of interfering with translation are generally
avoided in mRNA coding regions (87), significant
biases in favor of local RNA structures have been found 
in several bacterial species and yeast (17). Native mRNAs 
have a lower calculated folding free energy than random 
sequences (88), and correlations between mRNA and 
protein secondary structures have been noted (19). It was 
suggested that elevated C content at the third, synonymous
codon sites, which stabilizes RNA secondary structures (89) and creates
translational pauses, is driven by the usage of different encoded
amino acids in alpha-helices, beta-sheets and disordered
structures, which require different folding times. This phenomenon
is associated with differential codon usage, as
discussed in the previous section. 

Periodic pattern of mRNA folding in protein-coding 
regions 

A pronounced periodic pattern of mRNA secondary structure,
stability and nucleotide base-pairing was found in the
mammalian coding regions (Figure 1). This pattern is
created by the structure of the genetic code, and the 
relative abundance of dinucleotides is important for its 
maintenance (10). Although synonymous codon usage 
contributes to this pattern, even in the absence of codon 
bias, such a pattern can be observed at the degenerate codon
sites. While all codon sites are important for the maintenance
of mRNA secondary structure, degeneracy of the
code allows regulation of stability and periodicity of
mRNA secondary structure. Synonymous codon sites
contribute most strongly to mRNA stability, and base-pairing
at the third codon positions is significantly
higher than at other codon sites in mammalian transcriptomes
(Figure 1). Similar periodicities of mRNA stability
were theoretically predicted in bacterial, yeast, worm and 
fly transcripts (90). These results convincingly support the 
hypothesis that redundancies in the genetic code allow 
transcripts to satisfy the requirements for both protein 
and RNA structure. The RNA-level selection on 
synonymous positions maintains a more stable and
ordered mRNA secondary structure, which is likely to be
important for the transcript stability and translation (10).
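
The three-nucleotide periodicity of base-pairing described above can be examined, in a simplified form, by tallying paired nucleotides at each codon position of a predicted structure. The sketch below assumes that a secondary structure is already available in dot-bracket notation from some folding tool; the function name, its parameters and the toy structure are hypothetical, and the toy string is not a real fold.

```python
def paired_fraction_by_codon_position(structure, cds_start=0, cds_end=None):
    """Fraction of base-paired nucleotides at codon positions 1, 2 and 3.

    `structure` is a dot-bracket string ('(' or ')' = paired, '.' = unpaired)
    covering the transcript; `cds_start` and `cds_end` are 0-based
    coordinates of the coding region (end exclusive), with `cds_start`
    pointing at the first nucleotide of the start codon.
    """
    if cds_end is None:
        cds_end = len(structure)
    paired = [0, 0, 0]
    total = [0, 0, 0]
    for i in range(cds_start, cds_end):
        pos = (i - cds_start) % 3             # 0, 1, 2 = codon positions 1, 2, 3
        total[pos] += 1
        if structure[i] in "()":
            paired[pos] += 1
    return [p / t if t else 0.0 for p, t in zip(paired, total)]


if __name__ == "__main__":
    # Toy structure in which only third codon positions are paired.
    toy = "..(..(..)..)"
    print(paired_fraction_by_codon_position(toy))
```

Applied to many transcripts, a consistently higher paired fraction at the third position than at the first two would correspond to the periodic pattern discussed in this section.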

Recent application of Parallel Analysis of RNA 
Structure (PARS) at single-nucleotide resolution to 
profiling of mRNA secondary structure in budding yeast 
Saccharomyces cerevisiae confirmed in silico predictions of
the three-nucleotide periodicity of secondary structure 
across the coding regions and the existence of a more 
stable secondary structure in the coding versus untrans- 
lated regions (91,92). 

mRNA secondary structure in the vicinity of the start and 
stop codons 

Genome-wide analysis of eukaryotic mRNAs revealed 
distinct patterns of evolutionary conservation at the 
boundaries of the untranslated and coding regions. 
Conservation patterns at the synonymous positions in 
eukaryotes are more pronounced at the ends of the 
CDS, in the vicinity of the start and stop codons 
[Figure 1, (93)]. Elevated sequence conservation at syn- 
onymous positions likely reflects increased selection 
pressure on the structural features in these regions. The 
start and stop codons of mammalian transcripts mostly
reside in the unpaired regions of evolutionarily conserved
mRNA stem-loop structures (10). At the same time, 
functional mRNA domains (5'-UTRs, CDSs and 
3'-UTRs) preferentially fold onto themselves, with likely 
cross-domain (UTR-CDS) interactions in their vicinity. 
Such distinct folding patterns and placement of the start 
and stop codons into relaxed structures likely facilitate
efficient initiation and termination of translation (10). 
This trend of relaxed mRNA secondary structure near 
translation start codon was confirmed in other eukaryotic 
and prokaryotic species (93,94). This is a characteristic 
feature of highly expressed secretory proteins that tend 
to have relaxed secondary structure within the first 
30 bases of their open reading frames (92). An anti- 
correlation between the mRNA translation efficiency and 
the stability of the structure in the vicinity of the transla- 
tion start site was experimentally confirmed in yeast (92). 

The effect of mRNA folding on the rates of translation 
initiation and protein expression level was studied in 
E. coli. Expression of coding variants of the green fluor- 
escent protein in a synthetic library of 154 genes that 
varied randomly at the synonymous sites, but had the 
same amino acid sequence, showed 250-fold variations 
in protein expression levels (95). Stability of mRNA
folding near the ribosomal binding site appeared to be 
the defining factor that could explain more than half of 
the variation in the protein levels, whereas codon usage 
bias did not correlate with gene expression. The results of 
this analysis suggest that mRNA folding and associated 
rates of translation initiation play an important role in 
shaping protein expression levels. Experimental studies 
of individual genes support in silico predictions and dem- 
onstrate the importance of the mRNA folding in the 
vicinity of the start codon. An interesting example 
involves catechol-O-methyltransferase (COMT) (96), a 
major enzyme controlling catecholamine levels that plays 



a central role in pain perception and cognition (97). One 
of the common, in the human population, COMT haplo- 
types carries the non-synonymous variation C(166)T 
within the upstream coding region of the RNA transcript. 
This haplotype codes for a less stable protein that exhibits 
an elevated protein expression in vitro (97), which would 
compensate for lower protein stability. It appears that
structural destabilization near the start codon in the T 
allele mRNA could be related to the observed increase 
in the COMT expression. Folding simulations of the 
tertiary mRNA structures demonstrate that this destabil- 
ization lowers the folding transition barrier, thus 
decreasing the probability of occupying its native state.
These data suggest a structural mechanism whereby 
functional synonymous variations near the translation 
initiation site affect translation efficiency through 
entropy-driven changes in mRNA dynamics and present 
an example of stable compensatory genetic variations in 
the human population. 

Another case of the structure-dependent regulation 
involves mRNA sequences encoding leader peptides. 
Although traditionally it has been believed that the sole 
purpose of the leader sequences is to target proteins to the 
appropriate intracellular destinations, recent studies 
suggest that the leader sequence carries information on 
RNA secondary structure in the translation initiation 
region that may help to control the rate and speed of 
translation initiation. This is illustrated with yeast cyto- 
chrome oxidase subunit II (Cox2p) mRNA, whose 
upstream codons contain antagonistic control elements 
fine-tuning the translation: the positive control element 
includes the first 14 codons specifying the leader peptide, 
whereas the negative control element is contained within 
codons 15 to 91. These regulatory elements embedded in 
the translated COX2 mRNA sequence, together with 
trans-acting factors, could play a role in the coupling of
regulated synthesis of nascent pre-Cox2p polypeptide to 
its insertion in the mitochondrial inner membrane (98,99). 
We expect that such mechanisms of translational control 
may be common, and other interesting cases will be 
reported in future studies to encompass a wide variety of 
proteins containing leader peptides. 

RNA stability and protein abundance 

Synonymous substitutions may affect translation by 
facilitating stable loops that can significantly delay translation
initiation and/or ribosome translocation, or by
loosening mRNA secondary structures and eliminating 
obstacles to speedy translation (8,29,95). Such mRNA- 
structure-dependent changes in translation rates can 
have dramatic effects on protein abundance and predis- 
pose to disease development. For example, a correlation 
was found between the vulnerability to myogenous tem- 
poromandibular joint disorder and synonymous 
mutations in the human COMT gene, which has been dis- 
cussed in the previous section (11,29). Synonymous sub- 
stitutions in three common COMT haplotypes result in 
the formation of different stem-loop structures in the 
middle of the protein-coding region, and the stability 
of these structures inversely correlates with the amount 
of translated protein, leading to significant differences in
the level of COMT enzymatic activity in vivo. Synonymous 
substitutions in the COMT coding sequence substantially 
influence pain sensitivity and the risk of developing tem- 
poromandibular joint disorder by affecting expression of 
this key protein regulator of pain perception. 

Another example of naturally occurring synonymous 
mutations that affect mRNA stability and protein 
synthesis was described for the human dopamine 
receptor D2 (DRD2) gene (100). Synonymous variant 
C957T, rather than being 'silent', altered the predicted 
mRNA folding, led to a decrease in mRNA stability and
translation and dramatically changed dopamine-induced
upregulation of DRD2 expression. Variant G1101A did
not show an effect by itself but annulled the aforemen- 
tioned effects of C957T, demonstrating that combinations 
of synonymous mutations can have functional conse- 
quences drastically different from those of each isolated 
mutation. These results provide insights into mechanisms 
of molecular population genetics of diseases with complex 
inheritance and indicate that synonymous variation can 
have effects of potential pathophysiological and 
pharmacogenetic importance. Doubtless, these genes
are only a few examples among the potentially many
(101) that may be regulated through this mechanism.
Other examples for many proteins are emerging in some 
of the ongoing studies partially discussed elsewhere in this 
article. 

Native mRNAs have a lower calculated folding free 
energy than random sequences, and the average folding 
energy and ΔG of dinucleotide interaction are significantly
lower for abundant transcripts relative to rare
ones (10). There is no direct link, however, between the
thermodynamic stability of transcripts and their decay 
rates that are controlled by complex cellular mRNA 
decay systems using arrays of RNA-binding proteins 
and specific nucleases. There is abundant experimental 
evidence that the steady-state levels and decay rates of 
bacterial and mammalian mRNA strongly depend on 
the usage of synonymous nucleotides. Certain dinucleo- 
tides, for example, the across-codon dinucleotide T|A, 
are strongly avoided in both prokaryotes and eukaryotes, 
owing to fast enzymatic degradation of UA-rich mRNA 
species [reviewed in (102)]. 
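
The across-codon dinucleotide mentioned above is formed by the third position of one codon and the first position of the next. A minimal Python sketch of how such dinucleotides could be counted in an in-frame coding sequence is given below; the function name and the toy sequence are illustrative assumptions and are not taken from the cited studies.

```python
from collections import Counter

def across_codon_dinucleotides(cds: str) -> Counter:
    """Count dinucleotides that span codon boundaries (position 3 | position 1)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter()
    # Codon boundaries follow nucleotides 3, 6, 9, ... (0-based indices 2, 5, 8, ...)
    for i in range(2, len(cds) - 1, 3):
        counts[cds[i] + "|" + cds[i + 1]] += 1
    return counts

# Hypothetical toy sequence: ATG GCT AAA TTT TAA
toy_counts = across_codon_dinucleotides("ATGGCTAAATTTTAA")
print(toy_counts["T|A"])  # number of across-codon T|A dinucleotides in the toy sequence
```

Comparing such counts with those expected from the nucleotide composition of the same positions would be one way to quantify avoidance of a given across-codon dinucleotide.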

mRNA structure, post-translational modifications 
and regulation of protein folding 

Recent studies demonstrate that variations in translation 
speed induced by mRNA secondary structures can lead 
to changes in post-translational modifications of the 
nascent polypeptide, a level of protein regulation that 
was previously believed to be unconnected with the 
RNA level regulation. An example of translation- 
dependent regulation of post-translational arginylation 
was recently shown for actins (103), abundant proteins 
represented by six gene copies in higher vertebrates that 
are nearly identical at the amino acid level but are encoded 
by different synonymous codons. It has been a subject 
of long-term debate in the actin field why mammalian
genomes encode six highly similar actin proteins, and
why all these proteins appear to be only minimally
redundant despite their near identity at the amino acid
level. Non-muscle beta- and gamma-actin, two prevalent
non-muscle actin forms that often coexist in the same cell
in nearly equal levels, are differentially modified by post- 
translational arginylation that affects only beta-actin and 
regulates its function in cell motility (104).
Surprisingly, this difference in post-translational modifica- 
tions appeared to be regulated entirely through mRNA, 
which differs by ~12% between beta- and gamma-actin 
(103). Gamma-actin mRNA forms a stable secondary 
structure at the translation initiation site, whereas beta- 
actin mRNA is relatively unstructured in that region, re- 
sulting in a significant reduction in the translation speeds 
for gamma-actin compared with beta-. Although this dif- 
ference does not significantly affect the overall protein 
abundance, it appears to selectively affect post-transla- 
tionally modified states, causing slower folding of 
gamma-actin due to ribosome pausing and thus making 
it vulnerable to ubiquitin conjugation machinery attracted 
by co-translational arginylation. As a result, arginylated 
gamma-actin is selectively removed and never found in 
cells, whereas arginylated beta-actin, which escapes 
this degradation due to rapid synthesis and folding 
(103), accumulates in the cell (Figure 2). Thus, in the 
case of actin, synonymous codon-mediated changes 
in the mRNA secondary structure can lead to signifi- 
cant differences in protein translation rates and thus 
affect not only protein homeostasis but also post-
translational modifications. It appears likely that such a
mechanism can also be involved in achieving selectivity
in post-translational modifications of otherwise similar 
proteins. 

Synonymous single-nucleotide polymorphisms within 
the same gene can create individual variations in transla- 
tion speeds, leading to dramatic effects on protein folding 
between individuals. A striking example of this kind 
concerns multidrug resistance 1 (MDR1 or ABCB1)
gene (105,106). In this gene, frequent-to-rare codon syn- 
onymous substitutions lead to the synthesis of proteins 
with identical primary structures but distinctly different 
folding patterns and varied intracellular functions. These 
differences are believed to be generated by ribosome 
stalling that, if it lasts long enough, can affect the 
protein folding and lead to alternate folding patterns. 
Although the conformational and functional differences 
between the native and alternate states may be minor, 
the MDR1 case illustrates that the protein folding
barriers may nevertheless constitute sufficiently high 
hurdles on the physiological time scales, leading to kinet- 
ically trapped states with altered structures and functions. 
Other related examples have been identified in disease 
and discussed elsewhere. Overall, like with other effects 
of synonymous positions on protein functions, these 
cases are likely to be the first of many. Considering the 
possibility of selection against protein misfolding supported
by recent studies (63,64,68), it is likely that additional
experimental evidence of the role of mRNA
structure in determination of protein fate may be found 
in the near future. 

Figure 2. Differential arginylation of actin isoforms is regulated by a novel degradation mechanism coupled to the translation and folding dynamics
in vivo. Top, faster translation and folding of beta-actin protects the Lys18 residue from potential co-translational ubiquitination and degradation on
N-terminal arginylation. After emerging from the ribosome, arginylated beta-actin remains relatively stable and incorporates into actin cytoskeleton.
Bottom, slower translation and folding of gamma-actin coupled with co-translational arginylation exposes arginylated gamma-actin for
ubiquitination and ensures effective removal of 60-80% of arginylated gamma-actin protein. The fraction of arginylated gamma-actin that
escapes the co-translational 'check point' is still degraded faster, with half-life of only 1 h, so that no arginylated gamma-actin can be detected
in vivo. Image courtesy of Dr Fangliang Zhang.



REGULATION OF TRANSLATION THROUGH
RNA-RNA CROSS-TALK

It has long been assumed that RNA-RNA interactions in
the course of translation are limited to the classical codon-
anticodon base-pairing between mRNAs and tRNAs, as
well as to interaction of ribosomal RNA (rRNA) with
ribosome binding sites (RBS) on mRNAs in prokaryotes.
Recent evidence suggests that interactions between clinger
elements on rRNA molecules and complementary sites
scattered along mRNAs are important factors in regulation
of translation in both prokaryotes and eukaryotes.
In prokaryotes, internal Shine-Dalgarno-like sites in
the coding mRNA regions may function as translation 
delay signals. In addition to better known factors, such 
as codon usage and mRNA secondary structure, the com- 
plementary base-pairing between mRNA and rRNA 
molecules may play an important role in controlling
protein synthesis (14,107). It was proposed that mRNA-
rRNA cross-talk follows the multiple contact model
(Figure 3A and B) through formation of duplexes
between short complementary sites scattered over the
sequences (14,107,108). The universal occurrence of rRNA
clingers in prokaryotes and eukaryotes suggests that this
level of regulation was likely established early in evolution.
Strong G/C asymmetry of the coding strands, as
well as C-rich content of synonymous positions and
5'-UTRs in the vicinity of the start codon, might represent
regulatory adaptations for a more efficient and fast 
translation. 

mRNA-rRNA cross-talk in prokaryotes 

Sequence analysis of 16S rRNA of E. coli identified 
multiple sites, termed clinger elements or clingers, that are
complementary to the sites frequently occurring in
mRNAs and tRNAs and represent potential regions of
intermolecular hybridization. Clinger sites and their complementary
mRNA partners are highly conserved in E.
coli and might also operate in other prokaryotes by
base-pairing of the 16S rRNA in the 30S ribosomal
subunit with mRNAs (107). Major clingers on 16S
rRNA pair with abundant mRNA motifs and represent
universal binding sites for transcripts that belong to
different functional groups (Figure 3C and D). Notably,
clingers with pronounced hybridization affinity to
5'-UTRs of mRNAs are located in the 3'-end of 16S
rRNA, where several G-rich high affinity clingers exist
in addition to the classic anti-Shine-Dalgarno C-rich site
(Figure 3C). In contrast, clingers complementary to mRNA
coding regions are mainly located in the 5' and core
regions of 16S rRNA, whereas hybridization affinity of
the 3'-end of 16S rRNA to mRNA coding regions is relatively
low [(107), Figure 3D]. These results suggest an
adaptation of structural organization of the 16S rRNA
molecule to mRNA sequences, and support the idea that
RNA interactions with clingers may contribute to upregulation
of the translation process through increase in local
concentration of mRNAs and tRNAs in the vicinity of the 
ribosome and their proper positioning, or reduction in 
efficiency of translation through non-specific mRNA- 
16S rRNA interactions (107) or transient pausing of ribo- 
somes during translation (109). 
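
The search for short mRNA sites that can hybridize to rRNA clingers can be illustrated with a simple scan for perfectly complementary k-mers. The sketch below is a simplified stand-in for the hybridization-affinity analyses cited above: it considers only antiparallel Watson-Crick pairs, ignores G-U pairs and thermodynamics, and uses hypothetical function names. The nine-nucleotide rRNA string is a toy input standing in for the 3' end of 16S rRNA with the anti-Shine-Dalgarno site, and the mRNA fragment is invented.

```python
def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))

def complementary_sites(mrna: str, rrna: str, k: int = 6):
    """Return (mrna_start, rrna_start) pairs of k-mers able to form a
    perfect antiparallel Watson-Crick duplex."""
    mrna = mrna.upper().replace("T", "U")
    rrna = rrna.upper().replace("T", "U")
    # Index each rRNA k-mer by the mRNA k-mer that would pair with it
    targets = {}
    for j in range(len(rrna) - k + 1):
        targets.setdefault(reverse_complement(rrna[j:j + k]), []).append(j)
    hits = []
    for i in range(len(mrna) - k + 1):
        for j in targets.get(mrna[i:i + k], []):
            hits.append((i, j))
    return hits

# Toy inputs (assumptions for illustration only)
rrna_tail = "ACCUCCUUA"
mrna_fragment = "GGCAGGAGGUAACAUG"
print(complementary_sites(mrna_fragment, rrna_tail))
```

Counting such hits per rRNA position over many transcripts would give a crude complementarity profile along the rRNA; a realistic analysis would replace exact matching with a duplex free-energy estimate.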

This concept is supported by a recent experimental study
in which a minimal reconstituted E. coli translation system
was used to identify efficient RBSs in an unbiased high-
throughput manner (110). The authors applied ribosome
display, a powerful in vitro selection method, to enrich only
those mRNA sequences that could direct rapid protein
translation. In addition to canonical Shine-Dalgarno
motifs, they recovered highly efficient C-rich sequences in
mRNA coding regions that exhibit unmistakable complementarity
to the 16S rRNA of the small subunit of the
ribosome (Figure 3C), indicating that broad-specificity
base-pairing may be an inherent general mechanism of
efficient translation. Furthermore, given the conservation
of ribosomal structure and function across species, the
broader relevance of C-rich RBS sequences is supported
by multiple diverse examples in nature, including C-rich
RBSs in several bacteriophages and plants, a poly-C consensus
before the start codon in lower eukaryotes and
Kozak-like sequences in vertebrates (111).

Figure 3. The multiple contact model of mRNA-rRNA interactions. Hybridization affinity of 16S rRNA to mRNAs in E. coli. mRNA-rRNA
sequences in the coding region of mRNAs (109). Panels C and D are adapted from 107.

Recently, Weissman and colleagues (109) reported a
genome-wide study of ribosome pausing in E. coli and
Bacillus subtilis by ribosome profiling, a technique that
allows the identification of ribosome-protected mRNA
by high-throughput sequencing. Results of the study
suggest that under nutrient-rich conditions, usage of rare
codons does not lead to significant delays in translation.
Rather, Shine-Dalgarno-like sites within the coding
sequences cause pervasive translational pausing, due to
hybridization between the mRNA and the 16S rRNA of
the translating ribosome. To avoid excessive pausing,
internal Shine-Dalgarno sequences are disfavored in the
protein-coding sequences, which avoid codons and codon
pairs that resemble canonical Shine-Dalgarno sites.
Such disfavor creates an inadvertent bias in codon usage
and also contributes to elevated C-content in highly
translated mRNAs. As natural environments, unlike
experimental conditions, often involve insufficient
nutrient supplies, it appears likely that nutrient starvation
and/or specific nutrient deficiencies induce evolutionary
adaptations to cause a downstream effect of
ribosome pausing in the content-dependent manner, and
thus, redundancy in the genetic code likely constitutes a
genuine evolutionary tool that controls translation rates.
Internal Shine-Dalgarno-like sequences and C-rich
RBSs are likely major determinants of translation rates
and a global driving force for the coding of bacterial
genomes.
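
The way synonymous codon choice can remove an internal Shine-Dalgarno-like site without altering the protein can be sketched in a few lines of Python. In this illustration the hexamer AGGAGG is taken, as an assumption, to represent a Shine-Dalgarno-like site; the function name and the toy coding sequence are hypothetical, and the standard genetic code is used.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, last base varying fastest
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_fixes(cds: str, motif: str = "AGGAGG"):
    """List single-codon synonymous substitutions that remove every copy
    of `motif` from an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    if motif not in cds:
        return []
    fixes = []
    for start in range(0, len(cds) - 2, 3):
        codon = cds[start:start + 3]
        for alt, aa in CODON_TABLE.items():
            # Keep only codons that differ but encode the same amino acid
            if alt != codon and aa == CODON_TABLE[codon]:
                variant = cds[:start] + alt + cds[start + 3:]
                if motif not in variant:
                    fixes.append((start // 3, codon, alt))
    return fixes

# Hypothetical toy sequence: ATG AAG GAG GCT TAA (M K E A stop)
print(synonymous_fixes("ATGAAGGAGGCTTAA"))
```

Each reported tuple gives the zero-based codon index, the original codon and a synonymous replacement. The sketch only tests exact matches to one hexamer; scoring weaker Shine-Dalgarno-like sites would require a hybridization-energy model.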

mRNA-rRNA cross-talk in eukaryotes 

Intermolecular hybridization experiments demonstrated
that human 5S rRNA and 18S rRNA molecules can
hybridize with mRNAs during translation (112).
Similarly, murine 18S rRNA and 28S rRNA form stable
hybrid structures with mouse mRNAs, suggesting that
such interactions could play a role in regulating translation
speed. As discussed previously, mRNA may interact with
rRNA through formation of duplexes between short complementary
sites scattered over sequences to position
mRNA properly for efficient translation (14,108).
Sequence analysis identified multiple 18S rRNA clingers
complementary to oligonucleotides that frequently occur
in both 5'-UTR and coding regions of mRNA and represent
potential hybridization regions (14). Many eukaryotic
mRNAs contain sequences that resemble segments of 28S
and 18S rRNAs, and these rRNA-like sequences are
present in both sense and antisense orientations. For
example, four potential 18S rRNA-interacting sequences
were found in hundreds of different mRNAs, and the
location of these sequences within the various genes was
not random (113). The distribution of clingers along the 18S
rRNA sequence is universal for different mRNAs
(Figure 4), and the affinity of clingers for mRNAs is 2-3
times higher than for intron sequences and for randomly
generated sequences with the same nucleotide content.
There is significant variability in the hybridization
affinity between different mRNAs, which suggests a
possible role of rRNA clingers in translation processes as
universal regions of mRNA binding that can affect translation
rates (14). Notable differences were found in the
affinity of rRNA to the groups of abundant and rare mammalian
mRNAs, as well as in the prevalence of C-rich synonymous
positions in the abundant mRNAs (9,93,114).
Elevated C-content in mRNA synonymous sites likely represents
an adaptation mechanism that adds to upregulation
of translation rates of abundant high-expression mRNA
species. For example, the hybridization affinity of 18S
rRNA clingers to abundant protein kinase transcripts
was ~4-fold higher than for rare kinase transcripts
[Figure 4, (114)].

The ability of several predicted clingers to interact with
mRNA during translation was experimentally confirmed.
There is evidence that mRNA sites interacting with rRNA
may facilitate translation. A 9-nucleotide sequence from



[Figure 4 plot: x-axis, position along 18S rRNA (nucleotides 1 to 1801); legend, abundant mRNAs and rare mRNAs.]

Figure 4. mRNA-rRNA intermolecular hybridization affinity. Distribution of complementarity of mouse 18S rRNA to several thousand mRNAs.
Peaks represent potential clingers on mouse 18S rRNA. Hybridization affinity of abundant and rare protein kinase transcripts to verified 18S rRNA
clinger (right insert box). Predicted secondary structure of verified 18S rRNA clinger (left insert box). This figure is adapted from 14 and 114.



Nucleic Acids Research, 2013, Vol. 41, No. 4 2083 



the 5' leader of the mouse Gtx homeodomain mRNA
facilitates translation initiation by base-pairing to 18S
rRNA. The role played by the Gtx element in translation in
eukaryotes to some extent resembles the function of 
Shine-Dalgarno sequences in translation in prokaryotic or- 
ganisms (113,115,116). The presence of the Gtx element in 
various mRNAs suggests that this element may affect 
translation of different transcripts. Another sequence com- 
plementary to 18S rRNA is preferentially located within 
coding regions in multiple rodent genes immediately 
upstream of the termination codon. The effects of the 
sequence complementarity to 18S rRNA on translation 
were assessed using rodent mRNA encoding ribosomal 
protein S15. Mutations that decrease this complementarity
without changing the amino acid sequence or affecting 
codon preference increase translation ~1.5 fold (13). 
It is likely that direct base-pairing of particular mRNAs 
to rRNAs within ribosomes may provide a mechanism 
of translational control that works in both directions, 
and clinger sites may function both as upregulating and
downregulating elements. 

These and other studies allow a better understanding of 
the role of intermolecular RNA interactions in regulation 
of protein expression, and suggest that selection pressure 
on synonymous sites could be imposed by requirements to 
accommodate or avoid placement of RNA-RNA inter- 
action sites within protein-coding sequences, which may 
contribute to upregulation of the translation process 
through increase in the local concentration of 
mRNAs in the vicinity of the ribosome and their proper 
positioning, or reduce the efficiency of translation through 
transient pausing of ribosomes during translation 
(14,107-109). 



ROLE OF SYNONYMOUS POSITIONS IN THE 
OVERLAPPING CODES: EUKARYOTIC 
REGULATORY SIGNALS 

Messenger RNA carries numerous short regulatory se- 
quences, such as transcription factor binding sites, RNA 
editing and localization elements, splicing and translation 
initiation signals that often overlap with protein-coding 
regions. The repertoire of overlapping codes is particularly 
rich in eukaryotic coding regions that harbor regulatory 
signals involved in alternative transcription, splicing and 
nucleosome positioning (6,24), binding sites for diverse 
mRNA-associated proteins, microRNA (miRNA) target 
sites and other elements of RNA-RNA cross-talk 
(14,15,24). Selection pressure exerted on synonymous 
codon positions at such sites allows many degrees of 
freedom for evolution that might be used for achieving 
changes in regulation of biological function without modi- 
fications of protein sequences. Single-nucleotide changes 
at synonymous positions dramatically influence transcrip- 
tome repertoire and enrich structures of alternative 
isoforms expressed in different tissues and under different 
conditions (6,9,40,67). In this section, we will discuss the 
diversity of overlapping codes and regulatory signals in 
higher eukaryotes and their contribution to the complexity 
of transcriptome. 



Given the key roles RNA signals and structural elements 
play in multiple aspects of normal physiology and regula- 
tion of protein function, it is not surprising that aberra- 
tions and alterations in RNA signals and structures at the 
primary and secondary levels can lead to dramatic consequences
for health and have been implicated in a number
of human diseases through various mechanisms (117).

miRNA-mRNA interaction and silencing influence 
synonymous codon choices 

Synonymous codons are widely selected for the needs of 
various biological mechanisms in transcription regulation 
in eukaryotes. Recent evidence suggests that miRNA 
function may affect synonymous codon choices in the 
vicinity of miRNA target sites that are commonly 
located in the coding regions of plant genes. A general 
trend of relieved structural accessibility around miRNA 
target sites was observed in four plant genomes (118). 
It was found that G- and C-rich codons are avoided in 
the regions flanking miRNA target sites, and this selection 
is stronger for GC-rich genes compared with the genes 
located in the GC-poor regions. The authors suggest 
that synonymous codons near miRNA targets are 
selected for efficient miRNA binding, and natural selec- 
tion on synonymous positions around miRNA target sites 
might, therefore, influence evolution of the coding regions. 
Similar selection may act on the coding regions in 
mammals and insects (119-122). Although the majority 
of characterized mammalian miRNA target sites are 
located in the 3'-UTRs, the large-scale studies show that 
they are also present and functional in coding regions and 
5'-UTRs (123-127). Targeting of sites harbored by the 
coding regions is generally less effective. However, they 
contain representative numbers of miRNA target sites 
that mediate notable repression, as demonstrated by 
genome functional studies (128-131). This conclusion is 
also confirmed by several research groups in experiments 
using reporter assays (132-134). 

Many of CDS-located target sites are conserved 
between closely related animal species (119,121,134). 
Coding regions of repeat-rich genes contain numerous 
potential target sites for particular miRNAs, and such 
genes are often strongly repressed. Such sequence 
repeats arise through evolutionary duplications and 
occur particularly frequently within families of the C2H2
class of zinc-finger genes (127). Efficient targeting of
coding-region repeats is highly predictable, and due to 
the large number of target sites within a single CDS, 
downregulation observed in reporter assays can be 
stronger than for many genes with 3'-UTR targets. 

Synonymous mutations at the miRNA-binding sites 
disrupt target recognition and may be implicated in 
disease development. For example, synonymous poly- 
morphism in the human IRGM gene affects binding site 
for miR-196 and leads to tissue-specific deregulation of the 
IRGM-dependent xenophagy that causes a predisposition
to Crohn's disease (135). Thus, recent studies suggest a 
role of the coding regions, and, specifically, the coding- 
sequence repeats, in post-transcriptional regulation. 
Selection pressure on the synonymous positions might 
affect synonymous codon choices in and around miRNA
target sites in favor of higher accessibility for miRNA
binding.
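
How a synonymous substitution can abolish a miRNA target site in a coding region can be shown with a toy seed-match check. The sketch assumes the common simplification that target recognition requires a perfect match to miRNA nucleotides 2-8 (the seed); the miRNA and the coding sequences below are invented for illustration and do not correspond to miR-196 or IRGM.

```python
def seed_match_site(mirna: str) -> str:
    """Return the mRNA 7-mer complementary to miRNA positions 2-8."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    seed = mirna.upper().replace("T", "U")[1:8]
    return "".join(pairs[base] for base in reversed(seed))

def has_seed_match(mrna: str, mirna: str) -> bool:
    """True if the mRNA contains a perfect seed match for the miRNA."""
    return seed_match_site(mirna) in mrna.upper().replace("T", "U")

# Hypothetical miRNA and coding sequences (assumptions for illustration)
toy_mirna = "UACGGAUCCAGUUGACUUAGCA"
reference_cds = "AUGGGAUCCGUGUAA"   # AUG GGA UCC GUG UAA (M G S V stop)
synonymous_cds = "AUGGGAUCUGUGUAA"  # UCC -> UCU, still serine

print(has_seed_match(reference_cds, toy_mirna))
print(has_seed_match(synonymous_cds, toy_mirna))
```

Both sequences encode the same peptide, yet only the reference sequence retains the seed match, which is the situation described above for disease-associated synonymous polymorphisms. Real target prediction also weighs site accessibility and flanking context, which this sketch ignores.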

Splicing control imposed by mRNA folding and 
intermolecular interactions at exon-intron boundaries 

The majority of protein-coding genes in mammals 
undergo alternative splicing, whereby the same sequence 
belongs to an exon in one subset of transcripts of a 
given gene locus and to an intron in another subset of 
transcripts. Indeed, the latest estimates based on 
high-throughput transcriptome sequencing indicate that 
up to 95% of multi-exon human genes are subject to al- 
ternative splicing that involves ~ 100 000 major alternative 
events (136). Exceptionally wide spread of alternative 
initiation and alternative termination of transcription in 
the genome (6,137,138), coupled with independent alter- 
native splicing events in different regions of the same 
gene locus, can yield dozens of different transcript 
variants. Such combinatorial use of alternative exons 
represents a major source of transcriptome diversity in 
higher eukaryotes, especially in humans and other 
mammals, where it allows generation of hundreds of
thousands of isoforms from 30 000-40 000 protein-coding
genes.

Traditionally, pre-mRNA has been viewed as a passive 
molecule that is kept by hnRNPs in the unfolded and un- 
structured form to allow snRNPs and other proteins to 
scan the pre-mRNA for regulatory sequences and process 
it into mature transcript. However, this view has been 
largely reconsidered in light of transcriptome studies 
demonstrating that pre-mRNA itself is actively regulating 
its own processing (24,139). It is well established that 
RNA structural elements can directly inhibit or activate 
splicing. Taking into account current data demonstrating
that pre-mRNA can be actively spliced as it is being 
transcribed (140), it is obvious that not only local but 
also distant mRNA structural elements might be import- 
ant for efficient splicing. In many cases, distant and local 
signals (5' and 3' exonic splice sites or branch points) 
within coding regions have been found involved in 
mRNA structure formation. U1, U2, U4, U5 and U6
snRNAs participate in excising the major class of 
introns from pre-mRNAs (24). The secondary structures 
of these snRNAs are highly conserved from yeast to 
human, as are their nucleotide sites that are involved in 
intermolecular interactions. These conserved regions 
specify the roles of snRNPs and participate in the intricate 
RNA-RNA interaction network during spliceosome 
assembly and function (141). Taking into account that 
this complicated machinery contains many active players 
with short interacting sites, efficient cross-talk between 
RNA and protein molecules requires high accessibility of
pre-mRNA. Many individual cases of such interactions 
have been described in the literature with examples of 
pre-mRNA structures that inhibit or accelerate splicing 
(9,24,139,142). 

Sequence analysis and prediction of RNA secondary
structures are useful tools in experimental design aimed
at determination of pre-mRNA regulatory sites.
Interspecies conservation of local RNA structures
identified with co-variation base-pairing models that
consider exchanges between paired dinucleotides in the
structure (e.g. G-U change to A-U or G-C) may correspond
to the functional signals of pre-mRNA processing.
Exonic splicing enhancers and silencers, usually located 
near intron-exon boundaries and represented by oligo- 
meric motifs, are responsible for a cross-talk between 
RNAs and spliceosomal proteins to facilitate splice-site
recognition (143,144). Selection pressure on such 
elements manifests itself with a high level of interspecies 
similarity of their conserved mRNA secondary structures 
(145), with a low frequency of polymorphisms in the 
paired regions and low density of SNPs at the ends of 
exons (143,144). Synonymous changes in exonic splicing 
enhancers and silencers could affect exclusion or retention 
of exons in mature transcripts. Recent reports have 
identified proteins and small molecules that can affect 
splicing by modulating RNA structures, thereby expand- 
ing our knowledge of the mechanisms of splicing regula- 
tion (24,139). 
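
The co-variation idea mentioned above, in which one base pair is exchanged for another while pairing is preserved, can be illustrated with a small check on two alignment columns. This is a minimal sketch under simplifying assumptions: the alignment is ungapped, only Watson-Crick and G-U pairs count as paired, and the function name and toy sequences are hypothetical.

```python
CANONICAL_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

def covariation_support(alignment, i, j):
    """Summarize base pairing between columns i and j of an ungapped RNA alignment."""
    pairs = [seq[i].upper() + seq[j].upper() for seq in alignment]
    paired = [p for p in pairs if p in CANONICAL_PAIRS]
    return {
        "fraction_paired": len(paired) / len(pairs),
        "pair_types": sorted(set(paired)),
        # Pairing kept in every sequence with more than one pair type
        # is consistent with compensatory (co-varying) substitutions
        "covaries": len(paired) == len(pairs) and len(set(paired)) > 1,
    }

# Toy alignment of three hypothetical orthologous hairpins
toy_alignment = ["GCAAAAGC", "ACAAAAGU", "GCAAAAGU"]
print(covariation_support(toy_alignment, 0, 7))
print(covariation_support(toy_alignment, 1, 6))
```

In the toy alignment, columns 0 and 7 show G-C, A-U and G-U pairs and are flagged as co-varying, whereas columns 1 and 6 are paired but invariant. Published co-variation models add statistical scoring and phylogenetic weighting that are not attempted here.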

RNA editing and protein recoding

RNA editing is a phenomenon that provides a mechanism 
for the alteration of particular nucleotides in RNA se- 
quences relative to their genomic templates, resulting in 
diversification of RNA sequences that consequently 
change their function (146,147). RNA editing has been 
found across all kingdoms of life, including viruses 
(148,149). A surprisingly large number of instances of 
RNA editing has been identified in humans using bioinfor- 
matics screens and high-throughput experimental investi- 
gations utilizing next-generation sequencing technologies 
(150). Analysis of RNA editing events in the human 
ENCODE RNA-seq data identified frequent editing of 
housekeeping genes involved in cell division, translation
and viral defense across multiple cell types (151).

RNA editing plays a variety of functional roles in regu- 
lation of gene expression. Editing of a nucleotide within the 
protein-coding region may change the identity of a particu- 
lar encoded amino acid or prematurely terminate the 
protein, create or deplete entire exons through changes 
in a splicing site, cause retention of mRNA in the nucleus
or miRNA modification, affect RNA stability, efficiency 
of protection against viral RNA, and heterochromatin 
formation (151). For example, RNA editing can lead to 
exonization of the Alu repeat in the nuclear prelamin A
recognition factor. Exon 8 in this gene is derived from 
the recently exonized sequence of the Alu repeat, where a 
non-valid (AA) 3' splice site is edited to a valid AG and 
alternatively spliced in a tissue-dependent manner, leading 
to a higher transcript abundance in brain tissue than in 
skeletal muscle (152). The sequence of the new exon 
contains the in-frame TAG stop codon that is efficiently 
edited to TGG, which codes for tryptophan, to keep the
reading frame (152,153). When editing is needed within a 
protein-coding sequence, a region of the coding sequence 
usually forms base-pairing with intronic regions of the 
same gene. In this example, another Alu element 25 bp 
upstream to the exonized Alu is crucial for creation of 
an Alu-Alu duplex that is required for RNA editing (153). The
selection force acting to maintain these interactions reduces 
evolutionary rates at synonymous positions of the sites im- 
portant for the duplex formation. RNA editing, occurring 
within intron-exon boundaries, can affect splicing, effect- 
ively resulting in the generation of alternatively spliced 
products (154). It also can change functioning of RNA 
structural elements related to the translation efficiency 
(155,156). Combinatorial editing is a significant contribu- 
tor to the transcriptome repertoire, suggesting that editing 
of synonymous positions, together with alternative 
exonization, adapted by natural selection, may serve as 
important mechanisms of transcriptome diversification in 
primates (157). 

RNA secondary structures are also involved in recoding
of protein sequences that changes the meaning of particular
codons. One of the best studied examples is the case
of selenocysteine insertion, which is driven by mRNA secondary
structures known as SECIS (158). As in other
cases, selective pressure acting on these structures slows 
down evolutionary rates for synonymous substitutions. 
Similar stable and conserved RNA structures are often 
required for frameshifting and stop codon readthrough 
that are common in viruses (159,160). The most prominent
example of frameshifting is antizyme in eukaryotes
(161,162). Frameshifting is triggered by mRNA-rRNA
interactions and evolutionarily protected by selection 
pressure on silent sites. 

mRNA stability and decay 

Mutations, errors in transcription and splicing may create 
mRNA variants encoding abnormal proteins. mRNA can 
serve as a quality control template by ensuring that defective
proteins, containing aberrant sequences that would
result in premature functional truncations and/or other 
major abnormalities, do not get synthesized at all [re- 
viewed in (163)]. Three major mechanisms of mRNA sur- 
veillance and decay function in the nucleus and cytoplasm 
(164). Nonsense-mediated mRNA decay that exists in all 
eukaryotes detects and degrades transcripts that contain 
premature stop codons (165,166). Non-stop mRNA decay 
targets mRNAs that lack a stop codon (167). No-go
mRNA decay detects abnormally stalled ribosomes and 
cleaves transcripts with low translation efficiency near 
such stalled sites by endonucleases (168). Overall, tran- 
scripts with a range of abnormalities resulting in low 
translation efficiency (defined by low ribosome density, 
slow ribosome translocation and abnormal initiation 
rates) are specifically targeted by various mRNA decay 
complexes in vivo, extending this regulatory mechanism 
to the translation level (169). Such low translational effi- 
ciency arises through multiple mRNA features acting in 
concert and can result from low translation initiation rate, 
mediated by stable secondary structure and/or weak initi- 
ation sites, as well as low translation elongation speed, 
mediated by codon usage (169). 
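
As a rough illustration of the first two surveillance categories described above, the following Python sketch labels a reading frame by the position of its first in-frame stop codon. The function name `classify_reading_frame` and the parameter `expected_sense_codons` are hypothetical, and the rule is deliberately simplistic: nonsense-mediated decay in cells depends on additional cues that are not modeled here, and no-go decay cannot be inferred from sequence alone.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def classify_reading_frame(mrna: str, expected_sense_codons: int) -> str:
    """Label an ORF by where its first in-frame stop codon occurs.

    `mrna` must begin at the start codon. `expected_sense_codons` is the
    number of amino-acid codons in the reference protein (a hypothetical
    input used only for this sketch).
    """
    seq = mrna.upper().replace("T", "U")
    # Split into complete codons, ignoring any trailing 1-2 nucleotides.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    for index, codon in enumerate(codons):
        if codon in STOP_CODONS:
            if index < expected_sense_codons:
                # Stop appears earlier than in the reference protein.
                return "premature_stop"
            return "normal_stop"
    # No in-frame stop codon found: the pattern targeted by non-stop decay.
    return "no_stop"
```

In this toy scheme, `"premature_stop"` marks a candidate for nonsense-mediated decay and `"no_stop"` a candidate for non-stop decay; any real classification would need transcript structure and experimental evidence.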

mRNA turnover is a highly controlled process. In
addition to a nonsense codon, specific downstream
sequence elements are required for mRNA destabilization
and degradation of abnormal nonsense transcripts.
Sequence motifs enriched in pyrimidines can predict
potential regions in mRNAs that, together with the upstream
nonsense codon, promote rapid decay of the mRNA. It was
also suggested that other sequence elements modulate the
activity of the downstream element by forming RNA
secondary structures (170).

Several sequence elements can regulate the rate of
turnover of a transcript by inhibiting or by promoting
decay through stabilizer or destabilizer elements,
respectively. Most of these elements, such as the AU-rich sites,
are located in 3'-UTRs, but they are also found in 5'-UTRs
and coding regions (171). For example, an AU-rich site in
the 3'-UTR and a destabilizing sequence known as the major
protein-coding region determinant (mCRD) within the c-fos
mRNA coding region work together (172). The mCRD is usually
located at least 450 nucleotides proximal to the poly(A)
tail and requires continuing translation for its
destabilizing function. Transit of ribosomes through the mCRD
element disrupts the complex and triggers mRNA
decay (173). In addition to the nonsense codon, specific
downstream sequence elements enriched in pyrimidines
are required for mRNA destabilization and likely
modulate the activity of the downstream element by
forming RNA secondary structures. Another example
of a translation-dependent instability element within the
protein-coding region was localized in yeast MATα1
mRNA (174). Notably, this element corresponds precisely
to an mRNA sequence previously shown to be
complementary to 18S rRNA. These results suggest a model where the
triggering of MATα1 mRNA destabilization results from
establishment of an interaction between translating
ribosomes and a downstream sequence element.

mRNA decay mechanisms not only serve as important
quality control checkpoints in the functioning of a normal 
cell, but also play a role as major disease barriers in organ- 
isms carrying recessive mutations that would cause protein 
truncations and/or major structural abnormalities that 
may result in dominant negative or gain-of-function 
effects at the organismal level. mRNA decay mechanisms 
also assist in degrading defective physiological transcripts 
and preventing the effects of routine inaccuracies in
transcription initiation and pre-mRNA splicing, as well as
other transcriptional errors (166,175). Finally, it has been
suggested that failure of mRNA decay mechanisms constitutes
a strong drive for molecular evolution aimed to increase the 
overall robustness of genes to errors (176,177). 

As mRNA decay constitutes a key mechanism of 
the overall regulation of mRNA availability and protein 
synthesis, it is not surprising that a large role in disease 
and its prevention belongs to mRNA decay mechanisms. 
When combined with frameshift or nonsense mutations, 
they can result in premature termination of translation, 
often leading to deficiencies in critical proteins. Recent 
genome-wide association studies revealed a substantial 
fraction of synonymous substitutions linked to human 
disease risk and other genetic traits by mechanisms 
that are believed to be largely associated with alterations 
in translation rates and mRNA decay. Cases of 
β-thalassemia (178), cystic fibrosis (179), Duchenne
muscular dystrophy (180) and a number of cancers have 
been found to be linked to RNA decay. Other examples 
include somatic-cell rearrangement and hypermutation of
immunoglobulin or T-cell receptor genes that generate 
immune diversity (181). 

mRNA localization and interactions with 
mRNA-associated proteins 

Another level of protein regulation by mRNA arises 
through mRNA-associated proteins that package mRNA 
into an mRNP complex and participate in multiple aspects 
of mRNA functions, including regulation of its stability 
and turnover, translation initiation and translation 
rates, as well as its distribution throughout the cell that 
ensures preferential translation of specific proteins at key 
functional sites. Much of this interaction has been 
characterized at the untranslated regions, either upstream 
(5'-UTR), where binding or release of specific proteins can 
mediate translation initiation, or downstream (3'-UTR), 
where RNA-protein binding can regulate various aspects 
of RNA folding and targeting to different complexes; 
however, some prominent examples of such elements 
within the mRNA coding regions have also been found. 
A particular type of such regulation involves 'localizer' or 
'zipcode' sequences, which are present in a highly specific 
subclass of mRNAs (182) and target them to key cellular 
destinations (183). Zipcode-mediated targeting requires 
specific zipcode-binding proteins (184) that associate with 
mRNA during transcription (185) and induce RNA 
looping in the process of recognition and binding (186). 
Such targeting has proven to be highly physiologically 
important and has been implicated in a wide range of bio- 
logical processes, including leading-edge activity in motile 
cells (182), axon guidance and growth cone activity 
(187,188), brain development (189), G protein signaling 
(190), cell polarity and chemotaxis (191) and many 
others. Moreover, it has been found that zipcode-binding 
proteins regulate mRNA stability during stress and prevent 
their premature removal (192). 

An important but less explored aspect of RNA and 
DNA protein binding arises through the likely impact 
of synonymous nucleotide substitutions on affinity and 
recognition by regulatory protein factors. A correlation 
between coding sequence and protein binding has been 
found at the nucleosome level, where nucleosome position- 
ing apparently defines the rates of coding sequence evolu- 
tion (193). Other studies found a correlation between 
nucleosome positioning and evolution of tandem repeats 
(194). It appears likely that protein-nucleic acid binding
should in turn be regulated by nucleotide sequence and 
represent an interconnected hierarchical chain driving 
protein expression and mRNA function. 

EVOLUTION: RNA-LEVEL SELECTION PRESSURE 
ON PROTEIN-CODING SEQUENCES 

RNA selection pressure and the Ka/Ks metric of 
amino acid selection pressure 

As discussed previously, evolutionary selection pressure
acts at both the protein-coding level and the RNA or
nucleotide level (32,195-199). Patterns of the RNA-level
selective constraint are manifested by elevated sequence
similarity and base-pairing (or hybridization affinity),
which is crucial for RNA secondary structure, stability
and intermolecular interactions. These patterns are
specific for transcripts of different functional groups.
The functional importance of the RNA-level selection
pressure has been exemplified by the evidence of non-neutral
evolution at synonymous sites and by the finding that
alternatively spliced exons in mammals are more conserved at
their silent sites than constitutive exons (28). RNA selection
pressure affects genome architecture at different levels of
organization, as identified by conservation of local and
global RNA secondary structures in mammalian
pre-mRNAs and mRNAs (10,89,145). Two distinct
biological manifestations of RNA selection pressure, related to
RNA hybridization affinity, are seen in the coding regions:
one associated with mRNA folding/stability and the other
with mRNA intermolecular interactions.

An important question is how one can accurately
estimate RNA selection pressure. Evaluations of
evolutionary selection are based on the frequencies of
substitutions at the non-synonymous and synonymous sites,
termed Ka and Ks, respectively. The Ka/Ks ratio is generally
accepted as a measure of evolutionary selection on
protein-coding sequences, where the frequencies of mutations
observed at the non-synonymous and synonymous sites are:
Ka = ωρμ, Ks = ρμ and Ka/Ks = ω, which is not dependent
on ρ or μ (where ω is amino acid selection pressure, ρ is
RNA selection pressure and μ is mutation rate) (200,201).

At first approximation, the RNA selection pressure
could be described by a simplified model that considers
all potential driving forces of the RNA-level selection as
one independent variable, ρ (28). Ks, the key parameter
for the estimation of the RNA-level selection pressure,
could be measured accurately and independently of Ka,
and specific classifications of different events at
synonymous positions could be considered for the
accurate estimation of ρ (200). A sensitive bioinformatics
approach was recently suggested for identifying alterna- 
tively spliced exons with evidence of strong RNA selection 
pressure, where evolutionary selection against mutations 
changes only the mRNA sequence leaving the protein 
sequence unchanged (202). 
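
The independence of Ka/Ks from the RNA selection pressure and the mutation rate in the simplified model above can be checked numerically. The following Python sketch only illustrates that algebra; the function names and the numerical values are hypothetical and are not taken from the cited studies.

```python
def substitution_rates(omega: float, rho: float, mu: float) -> tuple[float, float]:
    """Return (Ka, Ks) for the simplified model Ka = omega*rho*mu, Ks = rho*mu.

    omega: amino acid selection pressure
    rho:   RNA selection pressure
    mu:    mutation rate
    """
    ks = rho * mu
    ka = omega * ks
    return ka, ks


def ka_ks_ratio(omega: float, rho: float, mu: float) -> float:
    """Ka/Ks under the same model; algebraically equal to omega."""
    ka, ks = substitution_rates(omega, rho, mu)
    return ka / ks


if __name__ == "__main__":
    # Illustrative values only: the same omega and mu with two values of rho.
    for rho in (1.0, 0.6):
        ka, ks = substitution_rates(omega=0.2, rho=rho, mu=1e-8)
        print(f"rho={rho}: Ka={ka:.2e}, Ks={ks:.2e}, Ka/Ks={ka / ks:.2f}")
```

Lowering the RNA selection term reduces both Ka and Ks while leaving their ratio unchanged, which is why Ks has to be examined on its own when RNA-level selection is of interest.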

The best-studied examples of the RNA-level selection are
associated with translation selection on codon usage and
selection pressure related to the regulation of pre-mRNA
splicing. Selection on codon usage reduces the synonymous
divergence rate, and may introduce a bias into the Ka/Ks
estimations. In some cases, when purifying selection
(selection that removes new deleterious mutations from the
population;
also known as negative selection) on synonymous sites is 
strong, a very low Ks might be due to the presence of 
splicing regulatory signals (9). Evaluation of synonymous 
evolution in the regions with high Ka/Ks ratio and 
accurate estimation of Ks and Ka values are helpful to 
identify Ka peak zones or Ks dips to detect positive selec- 
tion (natural selection that promotes the spread of a new 
mutation through the population, resulting in a fixed dif- 
ference between species; also known as Darwinian selec- 
tion) in specific genome regions or sites (9,203). Detailed 
individual analysis of Ka, Ks and Ka/Ks values and 
classification according to their ranges for regional and/or
site-specific applications are essential for the development 
of accurate models of the specific RNA-level selection. 
However, the task to classify and model all different 
aspects of the RNA-level selection pressure is a challenge. 
A statistical test was developed for identification of
purifying and positive selection at synonymous sites in the
protein-coding genes (69). To measure selection on syn- 
onymous sites, the authors used the substitution rate in 
intronic sequences (Ki) as a proxy for neutral evolution. 
The method is based on the difference in the statistical 
features of the CDS and intron sequences and uses 
shuffling of the intron sequence alignments such that 
their statistical properties mimic those of the coding 
sequences. 

Potential driving forces of selection at synonymous sites 

The majority of recent studies of the RNA-level selection 
in mammals were performed by comparing evolutionary 
rates at synonymous sites with those for flanking introns 
or within the introns of neighboring chromosome 
regions. This approach avoids complications resulting 
from regional variations in mutation rates and 
transcription-related bias. As discussed previously, GC 
enrichment at the synonymous sites, as compared with 
intronic sequences, could indicate selection acting at 
these sites (34,35,204). Estimations of Ks for synonymous 
sites and Ki for intronic sequences performed by different 
authors significantly vary for different gene sequences and 
mammalian species (9), which may be due to the meth- 
odological difficulties in the determination of the RNA- 
level selection pressure (28). 

Notably, although the overall rates of nucleotide sub- 
stitutions at synonymous sites and for intronic sequences 
are quite similar, their patterns are dramatically different 
(53,205). For example, C residues are more common at the 
four-fold degenerate sites than in introns, and also are 
relatively less likely to be associated with substitutions
(10,53). This is dictated by the structure of the genetic 
code, where all codons with C at the second position are 
four-fold degenerate, which could be responsible not only 
for the strand asymmetry (206), but also for the more 
stable and ordered mRNA folding (10). Taking into 
account that C-rich sites in mRNA have a potential to 
interact with rRNA clingers in both prokaryotes and
eukaryotes, the strand asymmetry may also indicate the 
RNA-level selection pressure to optimize translation levels 
of differently expressed proteins (14,107). 
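
The statement about the genetic code can be verified directly from the standard codon table. The Python sketch below builds that table and checks which two-nucleotide prefixes specify the same amino acid regardless of the third base; the helper names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, with codons ordered UUU, UUC, UUA, UUG, UCU, ...
# ('*' marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_fourfold_prefix(prefix: str) -> bool:
    """True if all four codons sharing this two-base prefix encode one amino acid."""
    return len({CODON_TABLE[prefix + third] for third in BASES}) == 1


def fourfold_prefixes() -> list[str]:
    """All two-base prefixes whose third codon position is four-fold degenerate."""
    return [a + b for a, b in product(BASES, repeat=2) if is_fourfold_prefix(a + b)]


if __name__ == "__main__":
    print(fourfold_prefixes())
    # Check the claim for every codon family with C at the second position.
    print(all(is_fourfold_prefix(first + "C") for first in BASES))
```

Under the standard code, eight prefixes qualify, and all four families with C at the second position (UCN, CCN, ACN and GCN) are among them.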

Results of transcriptome-wide analysis of the human and
mouse mRNA folding suggest that selection in favor of G
and C may be operating on synonymous codons to
maintain a more stable and ordered mRNA, which is
likely important for transcript stability and translation
(10). These data are in good agreement with theoretical
predictions of the average coefficient of selection in favor
of nucleotides G and C at human synonymous sites, which
shows limited variation across individual sites (34).
A plausible explanation for these results is synergistic
epistasis (34,207,208), expected, for example, if synonymous
sites are involved in maintaining the mRNA secondary
structures (10,17,209) or are responsible for mRNA
hybridization affinities and RNA interactions (107).
Evolutionary rates at synonymous sites are dependent on
the mutable CpG content. Rate of evolution at non-CpG
synonymous sites is 10% below that of similar intron sites,
whereas at postCpreG sites, it is 30% above that of similar
intron sites (34). From these data, a reasonable estimation
of neutral divergence between two mammalian genomes
(expressed as the mutation rates outside CpG context
multiplied by the number of generations of their
independent evolution) can be approximated as ~1.1 times the
Ks at non-CpG four-fold degenerate synonymous sites (34).
Current estimations suggest that ~40-50% of synonymous
positions in mammals have been opposed by selection
(10,34,210), whereas at least half of them are under
selection in favor of more stable mRNA secondary structure.

The nature of the driving forces of selection at
mammalian synonymous sites largely remains an open question.
Different factors might contribute to negative and positive
selection at synonymous sites, including gene function and
expression patterns, codon bias, mRNA folding and
stability (69). Analysis of associations between selection
on synonymous positions, mRNA stability and expression
revealed that the genes with positive selection at
synonymous sites showed no correlation between Ks and Ka,
indicating that evolution of synonymous sites in such
genes is uncoupled from protein evolution. As discussed
previously, significant negative correlation between Ks
and expression in the group of genes under purifying
selection indicates that highly expressed genes evolve
slowly. In contrast, synonymous sites in the genes under
positive selection show, on average, higher Ks in highly
expressed genes, and a significantly lower mRNA stability,
compared with the genes under negative selection.
Notably, positive selection at synonymous sites of
mammalian genes is substantially more common than positive
selection on the protein sequences, and might act through
mRNA destabilization affecting mRNA level and
translation (69). However, purifying (negative) selection on
synonymous sites is linked to elevated mRNA stability
(10,89,102).

RNA selection pressure affects the structure of functional 
domains and regulatory signals 

The RNA-level selection pressure in the protein-coding 
regions would have to be periodic, with periodicity of 
three nucleotides, and would not interfere with protein 
functional requirements (10,200). Codon usage bias 
could influence this periodicity, but other sources of bias 
are also important in mammals, such as mRNA stable 
secondary structure elements and sites interacting with 
rRNA clingers. Estimation of selection pressure associated
with the maintenance of mRNA secondary structure or 
mRNA intermolecular interactions could be more 
accurate if analysis of stable hairpins, stem-loop structures 
and sites responsible for the RNA-RNA interactions 
would be conducted separately. All these classifications 
should be based on the experimental results produced by 
the new SHAPE approach (selective 2'-hydroxyl acylation 
analysed by primer extension) or similar techniques 
(92,211) for the reliable estimation of the specific RNA-
level selection. Although theoretic predictions of RNA 
secondary structures and sites of intermolecular inter- 
actions are in good agreement with experimentally 
produced classifications (10,92,107,110,211,212), it is
more desirable to measure the RNA-level selection 
pressure based on reliable experimental data. 
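
One simple way to look for a three-nucleotide periodic signal in a coding sequence is to tabulate nucleotide composition separately for the three codon positions. The Python sketch below computes the G+C fraction at each position. It is only an illustrative proxy and not the SHAPE-based or folding-based analyses cited above; the function name `gc_by_codon_position` is hypothetical.

```python
def gc_by_codon_position(cds: str) -> list[float]:
    """Return the G+C fraction at codon positions 1, 2 and 3.

    `cds` is assumed to be an in-frame coding sequence (DNA or RNA);
    trailing nucleotides that do not form a complete codon are ignored.
    """
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("sequence must contain at least one complete codon")
    fractions = []
    for offset in range(3):
        # Every third nucleotide, starting at this codon position.
        column = seq[offset:usable:3]
        gc_count = sum(base in "GC" for base in column)
        fractions.append(gc_count / len(column))
    return fractions


if __name__ == "__main__":
    # Toy sequence with codons AUG GCC GCG UAA.
    print(gc_by_codon_position("AUGGCCGCGUAA"))
```

A markedly different value at the third position than at the first two would be consistent with position-specific constraints of the kind discussed here, although codon usage bias alone can produce the same pattern.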

What protein-coding regions could be under the strong 
RNA-level selection pressure? Comparison of 29 placental
mammalian species revealed ~10 000 highly conserved
regions with extremely low rates of synonymous substi- 
tutions corresponding to overlapping functional signals, 
such as splicing regulatory elements and miRNA target 
sites, RNA secondary structure elements and dual-coding 
genes and enhancers (213). Numerous studies demon- 
strated that alternatively spliced regions slowly evolve 
in the flanking intronic regions and synonymous pos- 
itions near exon boundaries (28). These observations 
might be indicative of the differences between constitu- 
tive and alternative exons, which could be related to the 
variability in density and composition of the splicing 
regulatory signals that tend to reside near exon-intron 
boundaries (28). The lower GC content of alternative 
exons has been proposed as a support for translation 
selection (214). Analysis of evolutionary rates (Ks and 
Ki) at the exon-intron boundaries in the human 
OPRM1 gene locus (215) showed that both alternative/
constitutive status of exon-intron boundaries and exon 
location (at the termini or core parts of a coding region) 
might affect the rate of evolution. The usage of certain 
codons is more biased near exon junctions, owing to sig- 
nificantly more common occurrence of the codon GAA 
in exonic splicing enhancers (145). Functional import- 
ance of such signals is exemplified by disease-related syn- 
onymous mutations that disrupt the splicing patterns and 
impair splicing regulation (9). 

Another protein-coding region under the strong RNA- 
level selection is a leader peptide, where RNA secondary 
structure is relaxed with specific local elements, 
compared with the downstream CDS (95,99). Several 
studies show that selection forces act almost uniformly 
to reduce the stability of mRNA at the beginning of 
protein-coding regions in different organisms 
(10,94,95,216). Relaxed mRNA secondary structures are 
characteristic for the start and stop codon regions, where 
they may facilitate initiation and termination of transla- 
tion (10,97). Thus, we can conclude that certain 
conserved protein-coding regions are under the strong 
RNA-level selection pressure. 

The evolutionary tradeoff between selective pressure acting 
at the RNA and protein levels 

How the RNA-level selection pressure affects
non-synonymous positions is still an open question. Some
evidence of the evolutionary tradeoff between selection
pressure acting at the RNA and protein levels was found
in viral genomes (211,212,217) and provided a better
understanding of their evolution and variability. One
interesting example is the HIV-1 RNA genome, the
secondary structure of which has been experimentally
determined (211). A correspondence was found between
RNA and protein primary sequences as well as a
correlation between high levels of RNA structure and sequences
that encode inter-domain loops in HIV proteins. Analysis
of this information led the authors (212) to the conclusion
that mRNA and protein structures do not evolve
independently. A negative correlation exists between the
extent of base-pairing in the RNA and amino acid
variability. Relaxed mRNA secondary structures in the coding
regions may favor the accumulation of genetic variation in
proteins and, conversely, sequence changes driven by
selection at the protein level may disrupt existing RNA
structures.

Further evidence of co-evolution of mRNA and
protein structures emerged from the analysis of Ka/Ks 
and Ks values in mammals, where the positive correlation 
between these values is due to runs of adjacent 
substitutions (218). Strong positive correlation between 
Ka and Ks was found for the double mutations in the 
same codons in mammalian protein kinase genes (114),
where in the majority of cases, one of the mutations is 
synonymous and the second is not (Figure 5). These sub- 
stitutions may reflect selection acting at both the nucleo- 
tide and protein levels. Obviously, such correlation may 
arise if synonymous and non-synonymous sites are parts 
of the same structure or same regulatory signal involved in 
the intra- or intermolecular interactions. Although a 
definite explanation of the reason for the positive correl- 
ation between Ks and Ka is still open (218), there is 
evidence to suggest that the evolutionary tradeoff 
between selection forces acting at the RNA and protein 
levels in mammals exists. 



CONCLUSIONS 

Synonymous nucleotide positions are essential for the
maintenance and function of diverse regulatory signals
located in the protein coding regions. There are several
levels of punctuation complexity and biological signals
encoded by mRNAs. A prominent punctuation signal is
the periodic pattern of RNA secondary structure, which
provides for a more ordered and stable structure of
transcripts in the protein-coding regions and may also support
maintenance of the reading frame during translation
(10,219). This basic pattern is overlaid by stable conserved
RNA secondary structure elements (29,100,101) that
may cause translation pausing or stalling. The functional
significance of synonymous positions for the maintenance
of local stable RNA structures, which are crucial for
regulation of protein expression, is well recognized,
especially at the initiation of translation (10,89,92). These
stable conserved folding elements, the second class of
mRNA punctuation elements, could affect translation
and, ultimately, the protein structure and function
(10,11,29,103,106), whereas higher-order RNA structures
may directly define protein folding, especially at domain
junctions (63,64,211). Often this type of signal is located
in the sequences encoding protein inter-domain loops,
such as in the HIV genome (211). The third class of
RNA punctuation signals are sites of intermolecular
interactions providing, for example, regulation of
translation (Shine-Dalgarno elements and sites interacting with
rRNA clingers), splicing sites, and miRNA target sites
(9,14,107,109,143).

Figure 5. Correlation between Ka and Ks for codons with single and double substitutions. Ka and Ks were estimated from sequence alignments of
complete kinome (~600 orthologous human and mouse protein kinase mRNAs, analysed in 46) by PAML program (Yang et al.). Strong correlation
between Ka and Ks was found for codons with double substitutions, where synonymous and non-synonymous substitutions occur (green dots).
Image courtesy of Dr Aleksey Ogurtsov.

Ribosome pausing or stalling, caused by the secondary 
structures of messenger RNA or mRNA hybridization to 
rRNA, can affect a variety of co-translational processes, 
including protein folding and targeting (109). Direct 
base-pairing of mRNAs to rRNA clinger sites within
ribosomes may function as upregulating and downregulating
elements (13), providing an additional mechanism of 
translational control. Most of these diverse RNA punctu- 
ation signals exist in both prokaryotes and eukaryotes, 
and enrich regulation of the translation efficiency and 
protein folding. 

The extraordinary complexity of transcriptomes that
underpins the structural and functional diversity of
mammalian proteomes is created by alternative splicing and
transcription with the use of distinct types of RNA splicing and
regulatory control elements (5,6). Synonymous codon
positions allow further diversification of intra- and
intermolecular mRNA hybridization affinity (128,129,131),
creating previously unrecognized patterns of RNA punctu- 
ation and hidden language of mRNA-miRNA cross-talk, 
characteristic for the higher eukaryotes and responsible for 
the regulation of the biological complexity, tissue-specific 
and condition-specific expression (15). 



In the past, transcriptomes have been mostly charac- 
terized by transcript sequences and expression levels. The 
recent progress in experimental techniques (SHAPE, 
PARS), together with improved computational prediction 
methods, has enabled genome-wide measurements of RNA 
structure and has provided the first picture of the structural 
organization of prokaryotic and eukaryotic transcriptomes 
(92,211,220,221). With further progress in method refine- 
ment and interpretation, structural views of the transcrip- 
tome should provide new approaches for the estimation of 
the RNA level of selection pressure, identification and val- 
idation of regulatory RNA patterns and new punctuation 
signals that are involved in diverse cellular processes, and 
thereby increase understanding of RNA function. 